LT²C²: A language of thought with Turing-computable Kolmogorov complexity

Sergio Romano, Mariano Sigman, Santiago Figueira

In this paper, we present a theoretical effort to connect the theory of program size to psychology by implementing a concrete language of thought with Turing-computable Kolmogorov complexity (LT²C²) satisfying the following requirements: 1) to be simple enough so that the complexity of any given finite binary sequence can be computed, 2) to be based on tangible operations of human reasoning (printing, repeating, ...), 3) to be sufficiently powerful to generate all possible sequences but not so powerful as to identify regularities which would be invisible to humans. We first formalize LT²C², giving its syntax and semantics and defining an adequate notion of program size. Our setting leads to a Kolmogorov complexity function relative to LT²C² which is computable in polynomial time, and it also induces a prediction algorithm in the spirit of Solomonoff's inductive inference theory. We then prove the efficacy of this language by investigating regularities in strings produced by participants attempting to generate random strings. Participants had a profound understanding of randomness and hence avoided typical misconceptions such as exaggerating the number of alternations. We reasoned that remaining regularities would express the algorithmic nature of human thoughts, revealed in the form of specific patterns. Kolmogorov complexity relative to LT²C² passed three expected tests examined here: 1) human sequences were less complex than control PRNG sequences, 2) human sequences were not stationary, showing decreasing values of complexity resulting from fatigue, 3) each individual showed traces of algorithmic stability since fitting of partial sequences was more effective to predict subsequent sequences than average fits. This work extends previous efforts to combine notions of Kolmogorov complexity theory and algorithmic information theory to psychology, by explicitly proposing a language which may describe the patterns of human thoughts.
I. Introduction

Although people feel they understand the concept of randomness [1], humans are unable to produce random sequences, even when instructed to do so [2-6], and they perceive randomness in a way that is inconsistent with probability theory [7-10]. For instance, random sequences are not perceived by participants as such because runs appear too long to be random [11, 12] and, similarly, sequences produced by participants aiming to be random have too many alternations [13, 14]. This bias, known as the gambler's fallacy, is thought to result from an expectation of local representativeness (LR) of randomness [10] which ascribes chance to a self-correcting mechanism, promptly restoring the balance whenever disrupted. In the words of Tversky and Kahneman [3], people apply the law of large numbers too hastily, as if it were the law of small numbers. The gambler's fallacy leads to classic psychological illusions in real-world situations such as the hot hand perception, by which people assume specific states of high performance, while analysis of records shows that sequences of hits and misses are largely compatible with a Bernoulli (random) process [15, 16].

Despite massive evidence showing that perception and production of randomness show systematic distortions, a mathematical and psychological theory of randomness remains partly elusive.
From a mathematical point of view — as discussed 
below — a notion of randomness for finite sequences 
presents a major challenge. 

From a psychological point of view, it remains difficult to determine whether the inability to produce and perceive randomness adequately results from a genuine misunderstanding of randomness or, instead, is a consequence of the algorithmic nature of human thoughts, which is revealed in the form of patterns and, hence, in the impossibility of producing genuine chance.

In this work, we address both issues by developing a framework based on a specific language of thought, instantiating a simple device which induces a computable (and efficient) definition of algorithmic complexity [17-19].

The notion of algorithmic complexity is described in greater detail below but, in short, it assigns a measure of complexity to a given sequence as the length of the shortest program capable of producing it. If a sequence is algorithmically compressible, it implies that there may be a certain pattern embedded (described succinctly by the program) and hence it is not random. For instance, the binary version of Champernowne's sequence [20]

01101110010111011110001001101010111100 ...

consisting of the concatenation of the binary representation of all the natural numbers, one after another, is known to be normal in the scale of 2, which means that every finite word of length $n$ occurs with a limit frequency of $2^{-n}$; e.g., the string 1 occurs with probability $2^{-1}$, the string 10 with probability $2^{-2}$, and so on. Although this sequence may seem random based on its probability distribution, every prefix of length $n$ is produced by a program much shorter than $n$.
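To make the last remark concrete, the following is a minimal Python sketch (the function name `champernowne_prefix` is ours, not part of the original work) of a short program that outputs any prefix of the binary Champernowne sequence. The program text has constant length, and only the number $n$ has to be supplied, which takes about $\log n$ symbols.

```python
def champernowne_prefix(n: int) -> str:
    """Return the first n bits of the binary Champernowne sequence.

    The sequence is the concatenation 0 1 10 11 100 101 ... of the
    binary representations of the natural numbers.
    """
    words = []
    total = 0
    k = 0
    while total < n:
        word = format(k, "b")   # binary representation of k, no prefix
        words.append(word)
        total += len(word)
        k += 1
    return "".join(words)[:n]


if __name__ == "__main__":
    # Reproduces the 38 bits displayed in the text above.
    print(champernowne_prefix(38))
```

The point of the sketch is only that the description stays short while the output grows; it says nothing about how hard the pattern is for a human to notice.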

The theory of program size, developed simultaneously in the '60s by Kolmogorov [17], Solomonoff [21] and Chaitin [22], had a major influence in theoretical computer science. Its practical relevance was rather obscure because most notions, tools and problems were undecidable and, above all, because it did not apply to finite sequences. A problem at the heart of this theory is that the complexity of any given sequence depends on the chosen language. For instance, the sequence

$x_1$ = 1100101001111000101000110101100110011100

which seems highly complex, may be trivially accounted for by a single character if there is a symbol (or instruction of a programming language) which accounts for this sequence. This has its psychological analog in the kind of regularities people often extract:

$x_2$ = 1010101010101010101010101010101010101010

is obviously a non-random sequence, as it can succinctly be expressed as

repeat 20 times: print '10'. (1)

Instead, the sequence

$x_3$ = 0010010000111111011010101000100010000101

appears more random and yet it is highly compressible as it consists of the first 40 binary digits of π after the decimal point. This regularity is simply not extracted by the human compressor and demonstrates how the exceptions to randomness reveal natural patterns of thoughts [23].

The genesis of a practical (computable) algorithmic information theory [24] has had an influence (although not yet a major impact) in psychology. Variants of Kolmogorov complexity have been applied to human concept learning [25], to general theories of cognition [26] and to subjective randomness [23, 27]. In the latter, Falk and Konold showed that a simple measure, inspired by algorithmic notions, was a good correlate of perceived randomness [27]. Griffiths & Tenenbaum developed statistical models that incorporate the detection of certain regularities, which are classified in terms of the Chomsky hierarchy [23]. They showed the existence of motifs (repetition, symmetry) and related their probability distributions to Kolmogorov complexity via Levin's coding theorem (cf. section VII for more details).

The main novelty of our work is to develop a class of specific programming languages (or Turing machines) which allows us to stick to the theory of program size developed by Kolmogorov, Solomonoff and Chaitin. We use the patterns of sequences of humans aiming to produce random strings to fit, for each individual, the language which captures these regularities.

II. Mathematical theory of randomness

The idea behind Kolmogorov complexity theory is 
to study the length of the descriptions that a formal 
language can produce to identify a given string. All 
descriptions are finite words over a finite alphabet, 
and hence each description has a finite length — or, 
more generally, a suitable notion of size. One string 
may have many descriptions, but any description 
should describe one and only one string. Roughly, 
the Kolmogorov complexity [17] of a string $x$ is the
length of the shortest description of $x$. So a string
is 'simple' if it has at least one short description, 
and it is 'complex' if all its descriptions are long. 
Random strings are those with high complexity. 

As we have mentioned, Kolmogorov complexity 
uses programming languages to describe strings. 
Some programming languages are Turing complete, 
which means that any partial computable function 
can be represented in it. The commonly used pro- 
gramming languages, like C++ or Java, are all Tur- 
ing complete. However, there are also Turing in- 
complete programming languages, which are less 
powerful but more convenient for specific tasks. 

In any reasonable imperative language, one can describe $x_2$ above with a program like (1), of length 26, which is considerably smaller than 40, the size of the described string. It is clear that $x_2$ is 'simple'. The case of $x_3$ is a bit tricky. Although at first sight it seems to have a complete lack of structure, it contains a hidden pattern: it consists of the first forty binary digits of π after the decimal point. This pattern could hardly be recognized by the reader, but once it is revealed to us, we agree that $x_3$ must also be tagged as 'simple'. Observe that the underlying programming language is central: $x_3$ is 'simple' with the proviso that the language is strong enough to represent (in a reasonable way) an algorithm for computing the bits of π, a language to which humans are not likely to have access when they try to find patterns in a string. Finally, for $x_1$, the best way to describe it seems to be something like

print '1100101001111000101000110101100110011100',

which includes the string in question verbatim, length 48. Hence $x_1$ only has long descriptions and hence it is 'complex'.

In general, both the string of length $n$ which alternates 0s and 1s and the string which consists of the first $n$ binary digits of π after the decimal point can be computed by a program of length $\approx \log n$, and this applies to any computable sequence. The idea of the algorithmic randomness theory is that a truly random string of length $n$ necessarily needs a program of length $\approx n$ (cf. section II.ii for details).

i. Languages, Turing machines and Kolmogorov complexity

Any programming language $\mathcal{L}$ can be formalized with a Turing machine $M_{\mathcal{L}}$, so that programs of $\mathcal{L}$ are represented as inputs of $M_{\mathcal{L}}$ via an adequate binary codification. If $\mathcal{L}$ is Turing complete then the corresponding machine $M_{\mathcal{L}}$ is called universal, which is equivalent to saying that $M_{\mathcal{L}}$ can simulate any other Turing machine.

Let $\{0,1\}^*$ denote the set of finite words over the binary alphabet. Given a Turing machine $M$, a program $p$ and a string $x$ ($p, x \in \{0,1\}^*$), we say that $p$ is an $M$-description of $x$ if $M(p) = x$, i.e., the program $p$, when executed in the machine $M$, computes $x$. Here we do not care about the time that the computation needs, or the memory it consumes. The Kolmogorov complexity of $x \in \{0,1\}^*$ relative to $M$ is defined as the length of the shortest $M$-description of $x$. More formally,

$$K_M(x) = \min\big(\{|p| : M(p) = x\} \cup \{\infty\}\big),$$

where $|p|$ denotes the length of $p$. Here $M$ is any given Turing machine, possibly one with a very specific behavior, so it may be the case that a given string $x$ does not have any $M$-description at all. In this case, $K_M(x) = \infty$. In practical terms, a machine $M$ is a useful candidate to measure complexity if it computes a surjective function. In this case, every string $x$ has at least one $M$-description and therefore $K_M(x) < \infty$.
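As an illustration of the definition of $K_M$, here is a small Python sketch that computes it by brute force for two toy machines. Both machines (`toy_machine`, `double_only`) and the search bound `max_len` are hypothetical examples introduced here, not machines used in the paper. The search is feasible only because these machines are total and trivially simple; for a universal machine no such procedure exists.

```python
from itertools import product
from math import inf


def toy_machine(p: str):
    """Hypothetical surjective machine.

    A leading 0 means "print the rest verbatim";
    a leading 1 means "print the rest twice".
    The empty program has no output.
    """
    if p.startswith("0"):
        return p[1:]
    if p.startswith("1"):
        return p[1:] * 2
    return None


def double_only(p: str) -> str:
    """Hypothetical non-surjective machine: always prints its input twice."""
    return p * 2


def K(machine, x: str, max_len: int = 12):
    """Length of the shortest program p with machine(p) == x.

    Programs are enumerated by increasing length up to max_len.
    Returning inf means only that no description was found within
    the bound, which is a limitation of this sketch.
    """
    for length in range(max_len + 1):
        for bits in product("01", repeat=length):
            if machine("".join(bits)) == x:
                return length
    return inf


if __name__ == "__main__":
    print(K(toy_machine, "0101"))   # found via the doubling mode
    print(K(toy_machine, "011"))    # only the verbatim mode applies
    print(K(double_only, "011"))    # odd length: no description exists
```

With `toy_machine` every string has the verbatim description, so the complexity is always finite; with `double_only` strings of odd length have no description at all, matching the case $K_M(x) = \infty$.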

ii. Randomness for finite words 

The strength of Kolmogorov complexity appears when $M$ is set to any universal Turing machine $U$. The invariance theorem states that $K_U$ is minimal, in the sense that for every Turing machine $M$ there is a constant $c_M$ such that for all $x \in \{0,1\}^*$ we have $K_U(x) \le K_M(x) + c_M$. Here, $c_M$ can be seen as the specification of the language $M$ in $U$ (i.e., the information contained in $c_M$ tells $U$ that the machine to be simulated is $M$). If $U$ and $U'$ are two universal Turing machines then $K_U$ and $K_{U'}$ differ at most by a constant. In a few words, $K_U(x)$ represents the length of the ultimate compressed version of $x$, performed by means of algorithmic processes.

For analysis of arbitrarily long sequences, $c_M$ becomes negligible and hence for nonpractical aspects of the theory the choice of the machine is not relevant. However, for short sequences, as we study here, this becomes a fundamental problem, as notions of complexity are highly dependent on the choice of the underlying machine through the constant $c_M$. The most trivial example, as mentioned in the introduction, is that for any given sequence, say $x_1$, there is a machine $M$ for which $x_1$ has minimal complexity.

iii. Solomonoff induction 

Here we have presented compression as a framework to understand randomness. Another very influential paradigm proposed by Schnorr is to use the notion of martingale (roughly, a betting strategy), by which a sequence is random if there is no computable martingale capable of predicting forthcoming symbols (say, of a binary alphabet {0,1}) better than chance [28, 29]. In the 1960s, Solomonoff [21] proposed a universal prediction method which successfully approximates any distribution $\mu$, with the only requirement of $\mu$ being computable.

This theory brings together concepts of algorithmic information, Kolmogorov complexity and probability theory. Roughly, the idea is that amongst all 'explanations' of $x$, those which are 'simple' are more relevant, hence following Occam's razor principle: amongst all hypotheses that are consistent with the data, choose the simplest. Here the 'explanations' are formalized as programs computing $x$, and 'simple' means low Kolmogorov complexity.

Solomonoff's theory builds on the notion of monotone (and prefix) Turing machines. Monotone machines are ordinary Turing machines with a one-way read-only input tape, some work tapes, and a one-way write-only output tape. The output is written one symbol at a time, and no erasing is possible in it. The output can be finite if the machine halts, or infinite in case the machine computes forever. The output head of monotone machines can only "print and move to the right" so they are well suited for the problem of inference of forthcoming symbols based on partial (and finite) states of the output sequence. Any monotone machine $N$ has the monotonicity property (hence its name) with respect to extension: if $p, q \in \{0,1\}^*$ then $N(p)$ is a prefix of $N(p^\frown q)$, where $p^\frown q$ denotes the concatenation of $p$ and $q$.

One of Solomonoff's fundamental results is that 
given a finite observed sequence $x \in \{0,1\}^*$, the
most likely finite continuation $y$ is the one for which
the concatenation of $x$ and $y$ is least complex in a
Kolmogorov sense. This is formalized in the follow- 
ing result (see theorem 5.2.3 of [M]): for almost all 
infinite binary sequences $X$ (in the sense of $\mu$) we have

$$-\lim_{n \to \infty} \log \mu(y \mid X{\upharpoonright}n) = \lim_{n \to \infty} Km_U\big((X{\upharpoonright}n)^\frown y\big) - Km_U(X{\upharpoonright}n) + O(1) < \infty.$$

Here, $X{\upharpoonright}n$ represents the first $n$ symbols of $X$, and $Km_U$ is the monotone Kolmogorov complexity relative to a monotone universal machine $U$. That is, $Km_U(x)$ is defined as the length of the shortest program $p$ such that the output of $U(p)$ starts with $x$ (and possibly has a finite or infinite continuation).

In other words, Solomonoff inductive inference 
leads to a method of prediction based on data com- 
pression, whose idea is that whenever the source 
has output the string x, it is a good heuristic to 
choose the extrapolation y of x that minimizes 
$Km_U(x^\frown y)$. For instance, if one has observed $x_2$,
it is more likely for the continuation to be 1010 






Papers in Physics, vol. 5, art. 050001 (2013) / S Romano et al. 



rather than 0101, as the former can be succinctly described by a program like

repeat 22 times: print '10'. (2)

and the latter looks more difficult to describe; indeed the shortest program describing it seems to be something like

repeat 20 times: print '10'; (3)
print '0101'.

Intuitively, as program (2) is shorter than (3), $x_2^\frown 1010$ is more probable than $x_2^\frown 0101$. Hence, if we have seen $x_2$, it seems to be a better strategy to predict 1.

III. A framework for human thoughts

The notion of thought is not well grounded. We lack an operative working definition and, as also happens with other terms in neuroscience (consciousness, self, ...), the word thought is highly polysemic in common language. It may refer, for example, to a belief, to an idea or to the content of the conscious mind. Due to this difficulty, the mere notion of thought has not been a principal or directed object of study in neuroscience, although of course it is always present implicitly, vaguely, without a formal definition.

Here we do not intend to elaborate an extensive review on the philosophical and biological conceptions of thoughts (see [30] for a good review on thoughts). Nor are we in a theoretical position to provide a full formal definition of a thought. Instead, we point to the key assumptions of our framework about the nature of thoughts. This amounts to defining constraints on the class of thoughts which we aim to describe. In other words, we do not claim to provide a general theory of human thoughts (which is not feasible at this stage, lacking a full definition of the class) but rather of a subset of thoughts which satisfy certain constraints defined below.

For instance, E.B. Titchener and W. Wundt, the founders of the structuralist school in psychology (seeking structure in the mind without evoking metaphysical conceptions, a tradition which we inherit and to which we adhere), believed that thoughts were images (there are no imageless thoughts) and hence can be broken down into elementary sensations [30]. While we do not necessarily agree with this proposition (see Carey [31] for more contemporary versions denying the sensory foundations of conceptual knowledge), here we do not intend to explain all possible thoughts but rather a subset, a simpler class which, in agreement with Wundt and Titchener, can be expressed in images. More precisely, we develop a theory which may account for Boole's [32] notion of thoughts as propositions and statements about the world which can be represented symbolically. Hence, a first and crucial assumption of our framework is that thoughts are discrete. Elsewhere we have extensively discussed [33-39] how the human brain, whose architecture is quite different from Turing machines, can give rise to a form of computation which is discrete, symbolic and resembles Turing devices.

Second, here we focus on the notion of "prop-less" mental activity, i.e., whatever (symbolic) computations can be carried out by humans without resorting to external aids such as paper, marbles, computers or books. This is done by actually asking participants to perform the task "in their heads". Again, this is not intended to set a proposition about the universality of human thoughts but, instead, to delimit a narrower set of thoughts which we conceive to be theoretically addressable in this mathematical framework.

Summarizing: 

1. We think we do not yet have a good mathematical (or even philosophical) conception of thoughts as mental structures.

2. Intuitively (and philosophically), we adhere to a materialistic and computable approach to thoughts. Broadly, one can think (to picture, not to provide a formal framework) that thoughts are formations of the mind with a certain stability which defines distinguishable clusters or objects [40-42].

3. While the set of such objects and the rules of their transitions may be of many different forms (analogous, parallel, unconscious, unlinked to sensory experience, non-linguistic, non-symbolic), here we work on a subset of thoughts, a class defined by Boole's attempt to formalize thought as symbolic propositions about the world.

4. These states, which may correspond to human "conscious rational thoughts", the seed of Boole and Turing foundations [32, 43], are discrete and defined by symbols and potentially represented by a Turing device.

5. We focus on an even narrower space of thoughts: binary formations (right or left, zero or one), in order to determine what kind of language best describes these transitions. This work can be naturally extended to understand discrete transitions in conceptual formations [44, 45].

6. We concentrate on prop-less mental activity to understand the limitations of the human mind when it does not have evident external support (paper, computer, ...).

IV. Implementing a language of thought with Turing-computable complexity

As explained in section II.ii, Kolmogorov complexity considers all possible computable compressors and assigns to a string $x$ the length of the shortest of the corresponding compressions. This seems to be a perfect theory of compression but it has a drawback: the function $K_U$ is not computable, that is, there is no effective procedure to calculate $K_U(x)$ given $x$.

On the other hand, the definition of randomness introduced in section II.ii, having very deep and intricate connections with algorithmic information and computability theories, is simply too strong to explain our own perception of randomness. To detect that $x_3$ consists of the first forty bits of π is incompatible with human patterns of thought.

Hence, the intrinsic algorithms (or observed patterns) which make human sequences not random are too restricted to be accounted for by a universal machine and may be better described by a specific machine. Furthermore, our hypothesis is that each person uses their own particular specific machine or algorithm to generate a random string.

As a first step in this complicated enterprise, we propose to work with a specific language LT²C² which meets the following requirements:

• LT²C² must reflect some plausible features of our mental activity when finding succinct descriptions of words. For instance, finding repetitions in a sequence such as $x_2$ seems to be something easy for our brain, but detecting numerical dependencies between its digits as in $x_3$ seems to be very unlikely.

• LT²C² must be able to describe any string in {0,1}*. This means that the map given by the induced machine $N = N_{\text{LT}^2\text{C}^2}$ must be surjective.

• $N$ must be simple enough so that $K_N$, the Kolmogorov complexity relative to $N$, becomes computable. This requirement clearly makes LT²C² Turing incomplete, but as we have seen before, this is consistent with human deviations from randomness.

• The rate of compression given by $K_N$ must be sensible for very short strings, since our experiments will produce such strings. For instance, the approach followed in [46], of using the size of the compressed file via general-purpose compressors like Lempel-Ziv based dictionary (gzip) or block based (bzip2) to approximate the Kolmogorov complexity, does not work in our setting. This method works best for long files.

• LT²C² should have certain degrees of freedom, which can be adjusted in order to approximate the specific machine that each individual follows during the process of randomness generation.

We will not go into the details of how to codify the instructions of LT²C² into binary strings: for the sake of simplicity we take $N$ as a surjective total mapping LT²C² → {0,1}*. We restrict ourselves to describing the grammar and semantics of our proposed programming language LT²C². It is basically an imperative language with only two classes of instructions: a sort of print $i$, which prints the bit $i$ in the output; and a sort of repeat $n$ times $P$, which for a fixed $n \in \mathbb{N}$ repeats $n$ times the program $P$. The former is simply represented as $i$ and the latter as $(P)^n$.

Formally, we set the alphabet $\{0, 1, (, ), {}^0, \dots, {}^9\}$ and define LT²C² over such alphabet with the following grammar:






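$$P ::= \varepsilon \mid 0 \mid 1 \mid PP \mid (P)^n,$$

where $n > 1$ is the decimal representation of $n \in \mathbb{N}$ and $\varepsilon$ denotes the empty string. The semantics of LT²C² is given through the behavior of $N$ as follows:

$$\begin{aligned}
N(i) &\stackrel{\text{def}}{=} i \quad \text{for } i \in \{\varepsilon, 0, 1\} \\
N(P_1 P_2) &\stackrel{\text{def}}{=} N(P_1)^\frown N(P_2) \\
N((P)^n) &\stackrel{\text{def}}{=} \underbrace{N(P)^\frown \cdots {}^\frown N(P)}_{n \text{ times}}.
\end{aligned}$$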

$N$ is not universal, but every string $x$ has a program in $N$ which describes it: namely $x$ itself. Furthermore, $N$ is monotone in the sense that if $p, q \in$ LT²C² then $N(p)$ is a prefix of $N(p^\frown q)$. In Table 1 the first column shows some examples of $N$-programs which compute 1001001001.

program              size
1001001001           10
(100)^2 1(0)^2 1     6.6
(100)^3 1            4.5
1((0)^2 1)^3         3.8

Table 1: Some $N$-descriptions of 1001001001 and their sizes for b = r = 1
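The grammar and semantics above, together with the program size defined in section IV.i, can be sketched as a small interpreter. This is an illustrative Python sketch, not the authors' implementation: the function name `run`, the use of Unicode superscript digits for exponents, and in particular the choice of a base-10 logarithm are assumptions made here. The text only writes "log n"; base 10 is the choice that reproduces the sizes listed in Table 1.

```python
from math import log10

# Exponent digits of the LT²C² alphabet, written as Unicode superscripts.
SUPERSCRIPTS = "⁰¹²³⁴⁵⁶⁷⁸⁹"


def run(program: str, b: float = 1.0, r: float = 1.0):
    """Return (output, size) of an LT²C² program.

    Size rules: each printed bit costs b, concatenation adds sizes,
    and (P)^n costs r*log10(n) plus the size of P (log base assumed).
    """
    pos = 0

    def sequence():
        nonlocal pos
        out, size = "", 0.0
        while pos < len(program) and program[pos] != ")":
            ch = program[pos]
            if ch in "01":                       # print instruction
                out += ch
                size += b
                pos += 1
            elif ch == "(":                      # repeat instruction (P)^n
                pos += 1
                body_out, body_size = sequence()
                if pos >= len(program):
                    raise ValueError("missing ')'")
                pos += 1                         # skip ')'
                start = pos
                while pos < len(program) and program[pos] in SUPERSCRIPTS:
                    pos += 1
                if start == pos:
                    raise ValueError("missing exponent after ')'")
                digits = (str(SUPERSCRIPTS.index(c)) for c in program[start:pos])
                n = int("".join(digits))
                if n < 1:
                    raise ValueError("exponent must be at least 1")
                out += body_out * n
                size += r * log10(n) + body_size
            else:
                raise ValueError(f"unexpected symbol {ch!r}")
        return out, size

    out, size = sequence()
    if pos != len(program):
        raise ValueError("unmatched ')'")
    return out, size


if __name__ == "__main__":
    for prog in ["1001001001", "(100)²1(0)²1", "(100)³1", "1((0)²1)³"]:
        out, size = run(prog)
        print(prog, out, round(size, 1))
```

Run on the four programs of Table 1 with b = r = 1, each program outputs 1001001001 and the rounded sizes agree with the table.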



i. Kolmogorov complexity for LT²C²

The Kolmogorov complexity relative to $N$ (and hence to the language LT²C²) is defined as

$$K_N(x) = \min\{\|p\| : p \in \text{LT}^2\text{C}^2,\ N(p) = x\},$$

where $\|p\|$, the size of a program $p$, is inductively defined as:

$$\begin{aligned}
\|\varepsilon\| &= 0 \\
\|p\| &= b \quad \text{for } p \in \{0, 1\} \\
\|P_1 P_2\| &= \|P_1\| + \|P_2\| \\
\|(P)^n\| &= r \cdot \log n + \|P\|.
\end{aligned}$$

In the above definition, $b \in \mathbb{N}$, $r \in \mathbb{R}$ are two parameters that control the relative weight of the print operation and the repeat $n$ times operation. In the sequel, we drop the subindex of $K_N$ and simply write $K \stackrel{\text{def}}{=} K_N$. Table 1 shows some examples of the size of $N$-programs when b = r = 1. Observe that for all $x$ we have $K(x) \le \|x\|$.

It is not difficult to see that $K(x)$ depends only on the values of $K(y)$, where $y$ is any nonempty and proper substring of $x$. Since $\|\cdot\|$ is computable in polynomial time, using dynamic programming one can calculate $K(x)$ in polynomial time. This, of course, is a major difference with respect to the Kolmogorov complexity relative to a universal machine, which is not computable.

ii. From compression to prediction

As one can imagine, the perfect universal prediction method described in section II.iii is, again, non-computable. We define a computable prediction algorithm based on Solomonoff's theory of inductive inference but using $K$, the Kolmogorov complexity relative to LT²C², instead of $Km_U$ (which depends on a universal machine). To predict the next symbol of $x \in \{0,1\}^*$, we follow the idea described in section II.iii: amongst all extrapolations $y$ of $x$ we choose the one that minimizes $K(x^\frown y)$. If such $y$ starts with 1, we predict 1, else we predict 0. Since we cannot examine the infinitely many extrapolations, we restrict to those up to a fixed given length $\ell_F$. Also, we do not take into account the whole $x$ but only a suffix of length $\ell_P$. Both $\ell_F$ and $\ell_P$ are parameters which control, respectively, how many extrapolation bits are examined ($\ell_F$ many Future bits) and how many bits of the tail of $x$ ($\ell_P$ many Past bits) are considered.

Let $\{0,1\}^n$ (resp. $\{0,1\}^{\le n}$) be the set of words over the binary alphabet {0,1} of length $n$ (resp. at most $n$). Formally, the prediction method is as follows. Suppose $x = x_1 \cdots x_n$ ($x_i \in \{0,1\}$) is a string. The next symbol is determined as follows:

$$\text{Next}(x_1 \cdots x_n) = \begin{cases} 0 & \text{if } m_0 < m_1; \\ 1 & \text{if } m_0 > m_1; \\ g(x_{n-\ell_P} \cdots x_n) & \text{otherwise,} \end{cases}$$

where for $i \in \{0,1\}$

$$m_i = \min\{K(x_{n-\ell_P} \cdots x_n{}^\frown i^\frown y) : y \in \{0,1\}^{\le \ell_F}\},$$

and $g : \{0,1\}^{\ell_P} \to \{0,1\}$ is defined as $g(z) = i$ if the number of occurrences of $i$ in $z$ is greater than the number of occurrences of $1 - i$ in $z$; in case the number of occurrences of 1s and 0s in $z$ coincide then $g(z)$ is defined as the last bit of $z$.
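The dynamic-programming computation of $K$ and the prediction rule Next can be sketched together in Python. This is a minimal sketch under stated assumptions rather than the authors' code: the logarithm is taken in base 10 (the base consistent with Table 1), the tail is taken as the last `l_past` bits of `x`, ties between $m_0$ and $m_1$ are detected with a small floating-point tolerance, and the names `K`, `g`, `next_bit`, `B` and `R` are ours. The recursion uses the fact that a shortest program for a nonempty `x` is either a single print, a concatenation of programs for two nonempty parts, or a repetition of a block.

```python
from functools import lru_cache
from itertools import product
from math import log10

B, R = 1.0, 1.0   # weights b and r of the size function (b = r = 1 as in Table 1)


@lru_cache(maxsize=None)
def K(x: str) -> float:
    """Kolmogorov complexity of x relative to LT²C² (memoized recursion)."""
    n = len(x)
    if n == 0:
        return 0.0
    if n == 1:
        return B
    # Concatenation P1 P2: every split into two nonempty parts.
    best = min(K(x[:i]) + K(x[i:]) for i in range(1, n))
    # Repetition (P)^k: x consists of k >= 2 copies of a block y.
    for k in range(2, n + 1):
        if n % k == 0:
            y = x[: n // k]
            if y * k == x:
                best = min(best, R * log10(k) + K(y))
    return best


def g(z: str) -> str:
    """Majority bit of z; on a tie, the last bit of z."""
    zeros, ones = z.count("0"), z.count("1")
    if zeros > ones:
        return "0"
    if ones > zeros:
        return "1"
    return z[-1]


def next_bit(x: str, l_past: int = 8, l_future: int = 2) -> str:
    """Predict the next bit of a nonempty x from its last l_past >= 1 bits."""
    tail = x[-l_past:]

    def m(i: str) -> float:
        # Smallest complexity over all continuations y of length <= l_future.
        return min(
            K(tail + i + "".join(y))
            for length in range(l_future + 1)
            for y in product("01", repeat=length)
        )

    m0, m1 = m("0"), m("1")
    if abs(m0 - m1) < 1e-9:      # tie: fall back to the majority rule g
        return g(tail)
    return "0" if m0 < m1 else "1"


if __name__ == "__main__":
    print(round(K("1001001001"), 3))
    print(next_bit("10101010"), next_bit("00000000"))
```

For a string of length $n$ there are $O(n^2)$ distinct substrings and each is resolved with $O(n)$ candidate splits and block lengths, which is one way to see the polynomial bound mentioned above.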

V. Methods 

Thirty-eight volunteers (mean age = 24) participated in an experiment to examine the capacity of LT²C² to identify regularities in the production of binary sequences. Participants were asked to produce random sequences, without further instruction.

All the participants were college students or graduates with programming experience and knowledge of the theoretical foundations of randomness and computability. This was intended to test these ideas in a hard sample where we did not expect typical errors which result from a misunderstanding of chance.

The experiment was divided into four blocks. In each block the participant freely pressed the left or right arrow key 120 times.

After each key press, the participant received a notification with a green square which progressively filled a line to indicate to the participant the number of choices made. At the end of the block, participants were given feedback on how many times the predictor method had correctly predicted their input. After this point, a new trial would start.

The 38 participants each produced 4 sequences, yielding a total of 152 sequences. 14 sequences were excluded from analysis because they had an extremely high level of predictability. Including these sequences would have actually improved all the scores reported here.

The experiment was programmed 
in ActionScript and can be seen 
at |http : // gamesdata . laf his-server . exp . dc . uba 



VI. Results

i. Law of large numbers

Any reasonable notion of randomness for strings on base 2 should imply Borel's normality, or the law of large numbers, in the sense that if $x \in \{0,1\}^n$ is random then the number of occurrences of any given string $y$ in $x$ divided by $n$ should tend to $2^{-|y|}$, as $n$ goes to infinity.

A well-known result obtained in some investigations on generation or perception of randomness in binary sequences is that people tend to increase the number of alternations of symbols with respect to the expected value [27]. Given a string $x$ of length $n$ with $r$ runs, there are $n - 1$ transitions between successive symbols and the number of alternations between symbol types is $r - 1$. The probability of alternation of the string $x$ is defined as

$$P : \{0,1\}^{\ge 2} \to [0,1], \qquad P(x) = \frac{r - 1}{n - 1}.$$

In our experiment, the average $P(x)$ of participants was 0.51, very close to the expected probability of alternation of a random sequence, which should be 0.5. A t-test on the $P(x)$ of the strings produced by participants, where the null hypothesis is that they are a random sample from a normal distribution with mean 0.5, shows that the hypothesis cannot be rejected, as the p-value is 0.31 and the confidence interval on the mean is [0.49, 0.53]. This means that the probability of alternation is not a good measure to distinguish participants' strings from random ones, or at least, that the participants in this very experiment can bypass this validation.

Although the probability of alternation was close to the expected value in a random string, participants tend to produce n-grams of length > 2 with probability distributions which are not equiprobable (see Fig. 1). Strings containing more alternations (like 1010, 0101, 010, 101) and 3- and 4-runs have a higher frequency than expected by chance. This might be seen as an effort from participants to keep the probability of alternation close to 0.5 by compensating the excess of alternations with blocks of repetitions of the same symbol.

ii. Comparing human randomness with other random sources



We asked whether $K$, the Kolmogorov complexity relative to LT²C² defined earlier, is able to detect and compress more patterns in strings generated by participants than in strings produced by other sources which are considered random for many practical purposes. In particular, we studied strings from two sources: a Pseudo-Random Number Generator (PRNG) and Atmospheric Noise (AN).
Figure 1: Frequency of sub-strings up to length 4



For the PRNG source, we chose the Mersenne Twister algorithm [47] (specifically, the second revision from 2002, which is currently implemented in the GNU Scientific Library). The atmospheric noise was taken from the random.org site (property of Randomness and Integrity Services Limited), which also runs the real-time statistical tests recommended by the US National Institute of Standards and Technology to ensure the random quality of the numbers produced over time.

In Table 2 we summarize our results, using $b = 1$ and $r = 1$ for the parameters of $K$ as defined earlier.





                Participants   PRNG    AN
Mean μ          48.43          52.99   53.88
Std σ           6.62           3.06    2.87
1st quartile    45.30          50.42   51.88
Median          49.23          53.15   53.85
3rd quartile    51.79          55.21   55.79

Table 2: Values of $K(x)$, where $x$ is a string produced by participants, PRNG or AN sources



The mean and median of $K$ are higher for PRNG and AN strings than for participants' strings. This difference was significant, as confirmed by a t-test (p-value of $4.9 \times 10^{-11}$ when comparing the participants' sample with the PRNG sample, $1.2 \times 10^{-15}$ when comparing the participants' sample with the AN sample, and $1.4 \times 10^{-2}$ when comparing the PRNG sample with the AN sample).
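As a rough plausibility check, a two-sample t statistic can be recomputed from the summary values of Table 2 alone. The sketch below is only illustrative and rests on assumptions that are not stated in the text: it uses Welch's unequal-variance form of the test, and it assumes that the PRNG and AN samples contain 138 strings each, like the participants' sample. The function name `welch_t` is ours.

```python
import math


def welch_t(mean1, std1, n1, mean2, std2, n2):
    """Welch's t statistic and approximate degrees of freedom from summary statistics."""
    v1 = std1 ** 2 / n1  # squared standard error of the first mean
    v2 = std2 ** 2 / n2  # squared standard error of the second mean
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


if __name__ == "__main__":
    # Means and standard deviations from Table 2; the sample size 138 for the
    # PRNG sample is an assumption made for this sketch.
    t, df = welch_t(48.43, 6.62, 138, 52.99, 3.06, 138)
    print(f"participants vs PRNG: t = {t:.2f}, df = {df:.0f}")
```

A large negative statistic here simply reflects that the participants' mean complexity lies several standard errors below the PRNG mean.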

Therefore, despite the simplicity of LT²C², which is based merely on prints and repeats, it is rich enough to identify regularities of human sequences. The $K$ function relative to LT²C² is an effective and significant measure to distinguish strings produced by participants with a profound understanding of the mathematics of randomness from PRNG and AN strings. As expected, humans produce less complex (i.e., less random) strings than those produced by PRNG or atmospheric noise sources.

iii. Mental fatigue 

On cognitively demanding tasks, fatigue affects performance by deteriorating the capacity to organize behavior [48-52]. Specifically, Weiss claims that boredom may be a factor that increases non-randomness [15]. Hence, as another test of the ability of $K$ relative to LT²C² to identify idiosyncratic elements of human regularities, we asked whether the random quality of the participants' strings deteriorated with time.

For each of the 138 strings $x = x_1 \cdots x_{120}$ ($x_i \in \{0,1\}$) produced by the participants, we measured the $K$ complexity of all the sub-strings of length 30.

Specifically, we calculated the average $K(x_i \cdots x_{i+30})$ over the 138 strings for each $i \in [0, 90]$ (see Fig. 2), using the same parameters as before ($b = r = 1$), and compared it to the same sliding average procedure for the PRNG (Fig. 3) and AN (Fig. 4) sources.

The only source which showed a significant linear regression was the human-generated data (see Table 3) which, as expected, showed a negative correlation, indicating that participants produced less complex (less random) strings over time (slope -0.007, p < 0.02).
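The sliding-average procedure and the linear fit can be sketched as follows. This is a hedged illustration, not the code used in the study: the real complexity function $K$ of LT²C² is not reproduced here, so the sketch takes any complexity function as a parameter and uses a crude placeholder (`run_count`, the number of runs) in its place. It also assumes that the regression is fitted to the curve averaged over strings, which is one possible reading of the text.

```python
def run_count(s: str) -> int:
    """Placeholder complexity: number of runs in s. This is NOT the K of LT2C2."""
    return (1 + sum(1 for a, b in zip(s, s[1:]) if a != b)) if s else 0


def sliding_average(strings, complexity, window=30):
    """Average complexity over all strings of the window starting at each position i."""
    length = min(len(s) for s in strings)
    positions = range(length - window + 1)  # 0..90 for length 120 and window 30
    return [
        sum(complexity(s[i:i + window]) for s in strings) / len(strings)
        for i in positions
    ]


def ols_slope(ys):
    """Least-squares slope of ys against the positions 0, 1, ..., len(ys) - 1."""
    n = len(ys)
    x_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    sxy = sum((i - x_mean) * (y - y_mean) for i, y in enumerate(ys))
    sxx = sum((i - x_mean) ** 2 for i in range(n))
    return sxy / sxx
```

With strings of length 120 and a window of 30, `sliding_average` returns 91 values, one for each starting position from 0 to 90; a negative value of `ols_slope` on that curve corresponds to complexity decreasing over time.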

The finding of a fatigue-related effect shows that 
the unpropped, i.e., resource-limited, human Tur- 
ing machine is not only limited in terms of the lan- 
guage it can parse, but also in terms of the amount 
of time it can dedicate to a particular task. 
Figure 2: Average of $K(x_i \cdots x_{i+30})$ for participants

Figure 3: Average of $K(x_i \cdots x_{i+30})$ for PRNG

iv. Predictability 

Earlier we introduced a prediction method with two parameters: $\ell_p$ and $\ell_f$. A predictor based on LT²C² achieved levels of predictability close to 56%, which were highly significant (see Table 4). The predictor, as expected, performed at chance for the control PRNG and AN data. This fit was relatively insensitive to the values of $\ell_p$ and $\ell_f$, contrary to our intuition that there may be a memory scale which would correspond in this framework to a given length.

Figure 4: Average of $K(x_i \cdots x_{i+30})$ for AN

              Participants       PRNG              AN
Mean slope    -0.007             0.0016            -0.0005
p-value       0.02               0.5               0.8
CI            [-0.01, -0.001]    [-0.003, 0.006]   [-0.005, 0.004]

Table 3: Predictability

A very important aspect of this investigation, in line with the prior work of [23], is to inquire whether specific parameters are stable for a given individual. To this aim, we optimized, for each participant, the parameters using the first 80 symbols of the sequence and then tested these parameters in the second half of each segment (last 80 symbols of the sequence).

After this optimization procedure, mean predictability increased significantly to 58.14% (p < 0.002, see Table 5). As expected, the optimization based on partial data of PRNG and AN resulted in no improvement in the classifier, which remained at chance with no significant difference (p < 0.3, p < 0.2, respectively).

Hence, while the specific parameters for compression vary widely across individuals, they are stable within each individual on the time-scale of this experiment.
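The fit-then-test logic behind this stability check can be sketched generically. The sketch below is an illustration under explicit assumptions: the LT²C² predictor is not reproduced, so a trivial placeholder predictor (`repeat_lag`, which guesses the symbol seen a fixed number of steps ago) stands in for it, the parameter grid is arbitrary, and the position of the split is left as an argument. All names are hypothetical.

```python
def accuracy(predict, s, params, start, stop):
    """Fraction of positions i in [start, stop) where predict(s[:i], *params) == s[i]."""
    hits = sum(1 for i in range(start, stop) if predict(s[:i], *params) == s[i])
    return hits / (stop - start)


def fit_then_test(predict, s, grid, split):
    """Pick the parameters that score best before `split`, then score them after it."""
    best = max(grid, key=lambda p: accuracy(predict, s, p, 1, split))
    return best, accuracy(predict, s, best, split, len(s))


def repeat_lag(history, lag):
    """Placeholder predictor (NOT the LT2C2 one): guess the symbol seen `lag` steps ago."""
    return history[-lag] if len(history) >= lag else "0"


if __name__ == "__main__":
    s = "01" * 40  # a perfectly periodic toy sequence of 80 symbols
    best, held_out = fit_then_test(repeat_lag, s, [(1,), (2,), (3,)], split=40)
    print(best, held_out)
```

If parameters fitted on the first part of an individual's sequence predict the remainder better than average parameters do, this is evidence that they capture something stable about that individual.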





                Participants   PRNG    AN
Mean μ          56.16          50.69   49.48
Std σ           0.07           0.02    0.02
1st quartile    49.97          48.84   48.30
Median          55.02          50.77   49.04
3rd quartile    59.75          52.21   50.46

Table 4: Average predictability




                Participants   PRNG    AN
Mean μ          58.14          51.20   49.01
Std σ           0.07           0.04    0.03
1st quartile    52.88          48.56   47.11
Median          56.73          50.72   49.28
3rd quartile    62.02          53.85   50.48

Table 5: Optimized predictability
VII. Discussion

Here we analyzed strings produced by participants 
attempting to generate random strings. Partici- 
pants had a profound understanding of randomness 
and hence avoided typical misconceptions such as 
exaggerating the number of alternations. We rea- 
soned that remaining regularities would express the 
algorithmic nature of human thoughts, revealed in 
the form of specific patterns. 

Our effort here was to bridge the gap between Kolmogorov theory and psychology, developing a concrete language, LT²C², satisfying the following requirements: 1) to be simple enough so that the complexity of any given sequence can be computed, 2) to be based on tangible operations of human reasoning (printing, repeating, ...), 3) to be sufficiently powerful to generate all possible sequences but not so powerful as to identify regularities which would be invisible to humans.

More specifically, our aim is to develop a class of languages with certain degrees of freedom which can then be fit to an individual (or an individual in a specific context and time). Here, we opted for a comparatively easier strategy by only allowing the relative cost of each operation to vary. However, a natural extension of this framework is to generate classes of languages where structural and qualitative aspects of the language are free to vary. For instance, one can devise a program structure for repeating portions of (not necessarily neighboring) code, or consider the more general framework of for-programs, where the repetitions are more general than in our setting: for i=1 to n do P(i), where P is a program that uses the successive values of i = 1, 2, ..., n in each iteration. For instance, the following program

    for i=1 to 6 do
        print '0'
        repeat i times: print '1'

would describe the string 

010110111011110111110111111. 
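The for-program can be checked by transcribing it into an ordinary programming language. The following Python sketch is our own transcription (the function name `for_program` is hypothetical); it only illustrates the semantics of the example and is not part of LT²C².

```python
def for_program(n: int) -> str:
    """Output of: for i=1 to n do { print '0'; repeat i times: print '1' }."""
    out = []
    for i in range(1, n + 1):
        out.append("0")      # print '0'
        out.append("1" * i)  # repeat i times: print '1'
    return "".join(out)


if __name__ == "__main__":
    print(for_program(6))
```

For n = 6 the output has 27 symbols, while the program text itself stays the same size for any n, which is the kind of regularity a for-program can capture and a plain repeat cannot.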

The challenge from the computational theoretical point of view is to define an extension which induces a computable (even more, feasible, whenever possible) Kolmogorov complexity. For instance, adding simple control structures like conditional jumps, or allowing the use of imperative program variables, may make the language Turing-complete, with the theoretical consequences that we already mentioned. The aim is to keep the language simple and yet include structures to compact some patterns which are compatible with the human language of thought.

We emphasize that our aim here was not to generate an optimal predictor of human sequences. Clearly, restricting LT²C² to a very rudimentary language is not the way to go to identify vast classes of patterns. Our goal, instead, was to use human sequences to calibrate a language which expresses and captures specific patterns of human thought in a tangible and concrete way.

Our model is based on ideas from Kolmogorov complexity and Solomonoff's induction. It is important to compare it to what we think is the closest and most similar approach in previous studies: the work of Griffiths and Tenenbaum [23]. Griffiths and Tenenbaum devise a series of statistical models that account for different kinds of regularities. Each model $Z$ is fixed and assigns to every binary string $x$ a probability $P_Z(x)$. This probabilistic approach is connected to Kolmogorov complexity theory via Levin's famous Coding Theorem, which points out a remarkable numerical relation between the algorithmic probability $P_U(x)$ (the probability that the universal prefix Turing machine $U$ outputs $x$ when the input is filled up with the results of coin tosses) and the (prefix) Kolmogorov complexity $K_U$ described earlier. Formally, the theorem states that there is a constant $c$ such that for any string $x \in \{0,1\}^*$,

$$\left| -\log P_U(x) - K_U(x) \right| < c \qquad (4)$$

(the reader is referred to section 4.3.4 of [53] for more details). Griffiths and Tenenbaum's bridge to Kolmogorov complexity is only established through this last theoretical result: replacing $P_U$ by $P_Z$ in Eq. (4) should automatically give us some Kolmogorov complexity $K_Z$ with respect to some underlying Turing machine $Z$.

While there is hence a formal relation to Kol- 
mogorov complexity, there is no explicit definition 
of the underlying machine, and hence no notion of 
program. 

On the contrary, we propose a specific language of thought, formalized as the programming language LT²C² or, alternatively, as a Turing machine $N$, which assigns formal semantics to each program. Semantics are given, precisely, through the behavior of $N$. The fundamental introduction of program semantics and the clear distinction between inputs (programs of $N$) and outputs (binary strings) allows us to give a straightforward definition of Kolmogorov complexity relative to $N$, denoted $K_N$, which, because of the choice of LT²C², becomes computable in polynomial time. Once we have a complexity function, we apply Solomonoff's ideas of inductive inference to obtain a predictor which tries to guess the continuation of a given string under the assumption that the most probable one is the most compressible in terms of LT²C²-Kolmogorov complexity. As in [23], we also make use of the Coding Theorem (4), but in the opposite direction: given the complexity $K_N$, we derive an algorithmic probability $P_N$.
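The step from a complexity function to a probability and then to a prediction can be illustrated with a small sketch. It is written under several assumptions that go beyond the text: the weight $2^{-K}$ follows the form suggested by the Coding Theorem, the weights are normalized over the candidate continuations (our choice, made so that they sum to one), and a crude placeholder complexity (`run_count`, the number of runs) stands in for $K_N$, which is not reproduced here. All function names are hypothetical.

```python
from itertools import product


def run_count(s: str) -> int:
    """Placeholder complexity: number of runs in s. This is NOT K_N of LT2C2."""
    return (1 + sum(1 for a, b in zip(s, s[1:]) if a != b)) if s else 0


def continuation_probabilities(prefix, complexity, horizon=1):
    """Weight each continuation c by 2**-complexity(prefix + c), then normalize."""
    candidates = ["".join(c) for c in product("01", repeat=horizon)]
    weights = {c: 2.0 ** -complexity(prefix + c) for c in candidates}
    total = sum(weights.values())
    return {c: w / total for c, w in weights.items()}


def predict_next(prefix, complexity):
    """Guess the next symbol as the one yielding the most compressible extension."""
    probs = continuation_probabilities(prefix, complexity, horizon=1)
    return max(probs, key=probs.get)


if __name__ == "__main__":
    print(continuation_probabilities("0000", run_count))
    print(predict_next("0000", run_count))
```

Under the placeholder complexity, extending a constant string with the same symbol keeps the number of runs at one, so that continuation receives the larger probability and is the one predicted.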

This work is mainly a theoretical development: a framework to adapt Kolmogorov's ideas into a constructive procedure (i.e., defining an explicit language) for identifying regularities in human sequences. The theory was validated experimentally, as three tests were satisfied: 1) human sequences were less complex than control PRNG sequences, 2) human sequences were non-stationary, showing decreasing values of complexity, 3) each individual showed traces of algorithmic stability, since fitting of partial data was more effective to predict subsequent data than average fits. Our hope is that this theory may constitute, in the future, a useful framework to ground and describe the patterns of human thoughts.